Boxplots#

This page documents the functions used to create plots for visualizing curve ensembles.

spaghetti_plot#

Plots a random selection of curves.

Parameters#

client (bigquery.Client): BigQuery client object.

table_name (str): BigQuery table name containing data in ‘dataset.table’ form.

reference_table (str): BigQuery table name containing reference table in ‘dataset.table’ form.

geo_level (str): The name of a column from the reference table. The geographical level used to determine what places are included.

geo_values (str or listlike or None): The geographies to be included. A value or subset of values from the geo_level column. If None, then all values will be included.

geo_column (str, optional): Name of column in original table containing geography identifier. Defaults to ‘basin_id’.

reference_column (str, optional): Name of the column in the reference table corresponding to the geography identifiers in geo_column. Defaults to ‘basin_id’.

value (str, optional): Name of column in the original table containing the importation value to be analyzed. Defaults to ‘value’.

n (int, optional): Number of curves to plot. Defaults to 25.


Returns#

fig (plotly.graph_objects.Figure): Plotly Figure containing visualization.


Example#

import epidemic_intelligence as ei
from google.oauth2 import service_account
from google.cloud import bigquery

credentials = service_account.Credentials.from_service_account_file('../../../credentials.json') # use the path to your credentials
project = 'net-data-viz-handbook' # use your project name
# Initialize a GC client
client = bigquery.Client(credentials=credentials, project=project)

table_name = 'h1n1_R2.basins_prevalence_agg'
reference_table = 'reference.gleam-geo-map'
reference_column = 'basin_id' # name of a column in reference table
geo_column = 'basin_id' # name of a column in table corresponding to column in reference table
geo_level = 'basin_label' 
geo_values = 'Portland(US-ME)' 
value = 'Infectious_18_23'

sp_fig = ei.spaghetti_plot(
    client=client,
    table_name=table_name,
    reference_table=reference_table,
    geo_level=geo_level,
    geo_values=geo_values,
    geo_column=geo_column,
    reference_column=reference_column,
    value=value,
    n=100)

# finishing touches
sp_fig.update_layout(width=900, height=500, 
                     showlegend=True, 
                     font_family='PT Sans Narrow', 
                     title='Spaghetti Plot',)
sp_fig.show()

functional_boxplot#

A functional boxplot uses curve-based statistics that treat each entire curve, rather than each individual observation, as a single data point. The median curve and interquartile range are always plotted.
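As a minimal illustration of the curve-based view, here is how the 'mse' and 'abc' distances between two curves could be computed with NumPy. These are hypothetical helpers, not the package's internal code:

```python
import numpy as np

def curve_mse(a, b):
    # fixed-time pairwise mean squared error between two curves
    return float(np.mean((a - b) ** 2))

def curve_abc(a, b):
    # fixed-time pairwise area between curves (mean absolute error)
    return float(np.mean(np.abs(a - b)))

curve_a = np.array([0.0, 1.0, 4.0, 2.0, 1.0])
curve_b = np.array([0.0, 3.0, 3.0, 3.0, 1.0])

print(curve_mse(curve_a, curve_b))  # 1.2
print(curve_abc(curve_a, curve_b))  # 0.8
```

The package computes these distances between every pair of curves in BigQuery; the sketch only shows what a single pairwise comparison measures.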

Parameters#

client (bigquery.Client): BigQuery client object.

table_name (str): BigQuery table name containing data in ‘dataset.table’ form.

reference_table (str): BigQuery table name containing reference table in ‘dataset.table’ form.

geo_level (str): The name of a column from the reference table. The geographical level used to determine what places are included.

geo_values (str or listlike or None): The geographies to be included. A value or subset of values from the geo_level column. If None, then all values will be included.

geo_column (str, optional): Name of column in original table containing geography identifier. Defaults to ‘basin_id’.

reference_column (str, optional): Name of the column in the reference table corresponding to the geography identifiers in geo_column. Defaults to ‘basin_id’.

value (str, optional): Name of column in the original table containing the importation value to be analyzed. Defaults to ‘value’.

num_clusters (int, optional): Number of clusters that curves will be broken into based on grouping_method. Defaults to 1. Note: raising num_clusters above one significantly increases runtime.

num_features (int, optional): Number of features the kmeans algorithm will use to group curves if num_clusters is greater than 1. Must be less than or equal to the number of run_ids in the table.

grouping_method (str, optional): Method used to group curves. Must be one of:

  • 'mse' (default): Fixed-time pairwise mean squared error between curves.

  • 'abc': Fixed-time pairwise area between curves. Also called mean absolute error.

kmeans_table (str, optional): BigQuery table name containing clustering information in ‘dataset.table’ form. Used when kmeans has already been performed with delete_data=False, allowing the function to skip the costly kmeans step.

centrality_method (str, optional): Method used to determine curve centrality within their group. Must be one of:

  • 'mse' (default): Summed fixed-time mean squared error between curves.

  • 'abc': Summed fixed-time pairwise area between curves. Also called mean absolute error.

  • 'mbd': Modified band depth. For more information, see Sun and Genton (2011).

threshold (float, optional): Number of interquartile ranges a curve must lie from the median curve before it is considered an outlier. Defaults to 1.5.

dataset (str or None, optional): Name of the BigQuery dataset used to store intermediate tables. If None, a random hash value will be used. Defaults to None.

delete_data (bool, optional): If True, then intermediate data tables will be deleted when the function finishes. Defaults to False.

overwrite (bool, optional): If True, then the function will not prompt for confirmation before overwriting an existing BigQuery dataset. Defaults to False.
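The threshold rule works like a boxplot whisker applied to centrality scores: a curve whose score lies more than threshold interquartile ranges from the median score is treated as an outlier. A rough sketch of that logic, assuming a precomputed centrality score per curve (flag_outliers is a hypothetical helper, not the package's implementation):

```python
import numpy as np

def flag_outliers(centrality, threshold=1.5):
    # centrality: one score per curve (e.g. summed MSE to every other
    # curve); lower means more central. A curve lying more than
    # `threshold` interquartile ranges from the median score is flagged.
    med = np.median(centrality)
    q1, q3 = np.percentile(centrality, [25, 75])
    return np.abs(centrality - med) > threshold * (q3 - q1)

scores = np.array([1.0, 1.2, 0.9, 1.1, 8.0])  # last curve is far from the rest
print(flag_outliers(scores))  # [False False False False  True]
```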


Returns#

fig (plotly.graph_objects.Figure): Plotly Figure containing visualization.


Example#

# required
table_name = 'h1n1_R2.basins_prevalence_agg'
reference_table = 'reference.gleam-geo-map'
reference_column = 'basin_id' # name of a column in reference table
geo_column = 'basin_id' # name of a column in table corresponding to column in reference table
geo_level = 'basin_label'
geo_values = 'Portland(US-ME)'
value = 'Infectious_18_23'

# Set parameters for grouping
num_clusters = 1
num_features = 20 
grouping_method = 'mse' # mean squared error
centrality_method = 'mse' # mean squared error

dataset = None
delete_data = True

fbp_fig = ei.functional_boxplot(
    client=client,
    table_name=table_name,
    reference_table=reference_table,
    geo_level=geo_level,
    geo_values=geo_values,
    geo_column=geo_column,
    reference_column=reference_column,
    value=value,
    num_clusters=num_clusters,
    num_features=num_features,
    grouping_method=grouping_method,
    centrality_method=centrality_method,
    dataset=dataset,
    delete_data=delete_data,
    overwrite=True
)

# finishing touches
fbp_fig.update_layout(width=900, height=500, 
                     showlegend=True, 
                     font_family='PT Sans Narrow', 
                     title='Functional Boxplot',
                     yaxis_title="Infectious 18-23yo"
)
fbp_fig.show()

fixed_time_boxplot#

A fixed-time boxplot uses fixed-time statistics that rank the curves' values at each time step and use those ranks to construct a confidence interval for every time step. The median and interquartile range are always plotted.
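A minimal NumPy sketch of the fixed-time approach — rank the ensemble independently at every time step and read off percentile envelopes. This is illustrative only; the package performs the equivalent computation in BigQuery:

```python
import numpy as np

rng = np.random.default_rng(0)
# simulated ensemble: 100 runs, 30 time steps
curves = rng.normal(size=(100, 30)).cumsum(axis=1)

confidence = 0.95
# percentiles are taken across runs (axis=0), separately per time step
lower = np.percentile(curves, 100 * (1 - confidence) / 2, axis=0)
median = np.percentile(curves, 50, axis=0)
upper = np.percentile(curves, 100 * (1 + confidence) / 2, axis=0)

# one envelope value per time step
print(lower.shape, median.shape, upper.shape)  # (30,) (30,) (30,)
```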

Parameters#

client (bigquery.Client): BigQuery client object.

table_name (str): BigQuery table name containing data in ‘dataset.table’ form.

reference_table (str): BigQuery table name containing reference table in ‘dataset.table’ form.

geo_level (str): The name of a column from the reference table. The geographical level used to determine what places are included.

geo_values (str or listlike or None): The geographies to be included. A value or subset of values from the geo_level column. If None, then all values will be included.

geo_column (str, optional): Name of column in original table containing geography identifier. Defaults to ‘basin_id’.

reference_column (str, optional): Name of the column in the reference table corresponding to the geography identifiers in geo_column. Defaults to ‘basin_id’.

value (str, optional): Name of column in the original table containing the importation value to be analyzed. Defaults to ‘value’.

num_clusters (int, optional): Number of clusters that curves will be broken into based on grouping_method. Defaults to 1. Note: raising num_clusters above one significantly increases runtime.

num_features (int, optional): Number of features the kmeans algorithm will use to group curves if num_clusters is greater than 1. Must be less than or equal to the number of run_ids in the table.

grouping_method (str, optional): Method used to group curves. Must be one of:

  • 'mse' (default): Fixed-time pairwise mean squared error between curves.

  • 'abc': Fixed-time pairwise area between curves. Also called mean absolute error.

kmeans_table (str, optional): BigQuery table name containing clustering information in ‘dataset.table’ form. Used when kmeans has already been performed with delete_data=False, allowing the function to skip the costly kmeans step.

dataset (str or None, optional): Name of the BigQuery dataset used to store intermediate tables. If None, a random hash value will be used. Defaults to None.

delete_data (bool, optional): If True, then intermediate data tables will be deleted when the function finishes. Defaults to False.

overwrite (bool, optional): If True, then the function will not prompt for confirmation before overwriting an existing BigQuery dataset. Defaults to False.

confidence (float, optional): Confidence level of the interval that will be graphed, from 0 to 1. Also determines which points are considered outliers.

full_range (bool, optional): If True, then the mesh will be drawn around the entire envelope, including outliers. Defaults to False.

outlying_points (bool, optional): If True, then outlying points will be graphed. Defaults to True.


Returns#

fig (plotly.graph_objects.Figure): Plotly Figure containing visualization.


Example#

# required
table_name = 'h1n1_R2.basins_prevalence_agg'
reference_table = 'reference.gleam-geo-map'
reference_column = 'basin_id' # name of a column in reference table
geo_column = 'basin_id' # name of a column in table corresponding to column in reference table
geo_level = 'basin_label'
geo_values = 'Portland(US-ME)'
value = 'Infectious_18_23'

# Set parameters for grouping
num_clusters = 1
num_features = 20 
grouping_method = 'mse' # mean squared error
confidence = .95

dataset = None
delete_data = True

ft_fig = ei.fixed_time_boxplot(
    client,
    table_name,
    reference_table,
    geo_level,
    geo_values,
    geo_column=geo_column,
    reference_column=reference_column,
    num_clusters=num_clusters,
    num_features=num_features,
    grouping_method=grouping_method,
    value=value,
    dataset=dataset,
    delete_data=delete_data,
    kmeans_table=False,
    confidence=confidence,
    full_range=True,
    outlying_points=False,
)

# finishing touches
ft_fig.update_layout(width=900, height=500, 
                     showlegend=True, 
                     font_family='PT Sans Narrow', 
                     title='Traditional Boxplot')
ft_fig.show()

fetch_fixed_time_quantiles#

Calculates custom fixed-time quantiles. The median is always fetched alongside them.
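A given confidence level c maps to the symmetric quantile pair ((1-c)/2, (1+c)/2); for example, .5 yields the 25th and 75th percentiles. A tiny sketch of that mapping (confidence_to_quantiles is a hypothetical helper, not part of the package):

```python
def confidence_to_quantiles(confidence):
    # symmetric interval with the given coverage, centered on the median
    lo = round((1 - confidence) / 2, 10)
    hi = round((1 + confidence) / 2, 10)
    return lo, hi

print(confidence_to_quantiles(0.5))  # (0.25, 0.75) -> 25th and 75th percentiles
print(confidence_to_quantiles(0.9))  # (0.05, 0.95)
```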

Parameters#

client (bigquery.Client): BigQuery client object.

table_name (str): BigQuery table name containing data in ‘dataset.table’ form.

reference_table (str): BigQuery table name containing reference table in ‘dataset.table’ form.

confidences (list of float): List of confidences to gather, from 0 to 1. For example, entering .5 will result in the 25th and 75th percentiles being calculated.

geo_level (str): The name of a column from the reference table. The geographical level used to determine what places are included.

geo_values (str or listlike or None): The geographies to be included. A value or subset of values from the geo_level column. If None, then all values will be included.

geo_column (str, optional): Name of column in original table containing geography identifier. Defaults to ‘basin_id’.

reference_column (str, optional): Name of the column in the reference table corresponding to the geography identifiers in geo_column. Defaults to ‘basin_id’.

value (str, optional): Name of column in the original table containing the importation value to be analyzed. Defaults to ‘value’.

num_clusters (int, optional): Number of clusters that curves will be broken into based on grouping_method. Defaults to 1. Note: raising num_clusters above one significantly increases runtime.

num_features (int, optional): Number of features the kmeans algorithm will use to group curves if num_clusters is greater than 1. Must be less than or equal to the number of run_ids in the table.

grouping_method (str, optional): Method used to group curves. Must be one of:

  • 'mse' (default): Fixed-time pairwise mean squared error between curves.

  • 'abc': Fixed-time pairwise area between curves. Also called mean absolute error.

kmeans_table (str, optional): BigQuery table name containing clustering information in ‘dataset.table’ form. Used when kmeans has already been performed with delete_data=False, allowing the function to skip the costly kmeans step.

dataset (str or None, optional): Name of the BigQuery dataset used to store intermediate tables. If None, a random hash value will be used. Defaults to None.

delete_data (bool, optional): If True, then intermediate data tables will be deleted when the function finishes. Defaults to False.

overwrite (bool, optional): If True, then the function will not prompt for confirmation before overwriting an existing BigQuery dataset. Defaults to False.


Returns#

df (pandas.DataFrame): pandas DataFrame containing the requested quantiles and the median.


Example#

# uses the same parameters as fixed_time_boxplot!
df_ft = ei.boxplots.fetch_fixed_time_quantiles(
    client=client,
    table_name=table_name,
    reference_table=reference_table,
    confidences=[.9, .5], # just introduce the confidences parameter
    geo_level=geo_level,
    geo_values=geo_values,
    geo_column=geo_column,
    reference_column=reference_column,
    num_clusters=num_clusters,
    num_features=num_features,
    grouping_method=grouping_method,
    value=value,
    dataset=dataset,
    delete_data=delete_data,
    kmeans_table=False,
)

df_ft
    151 except Exception as exc:
    152     # defer to shared logic for handling errors
--> 153     _retry_error_helper(
    154         exc,
    155         deadline,
    156         sleep,
    157         error_list,
    158         predicate,
    159         on_error,
    160         exception_factory,
    161         timeout,
    162     )
    163     # if exception not raised, sleep before next attempt
    164     time.sleep(sleep)

File ~\Documents\24f-coop\demovenv\Lib\site-packages\google\api_core\retry\retry_base.py:212, in _retry_error_helper(exc, deadline, next_sleep, error_list, predicate_fn, on_error_fn, exc_factory_fn, original_timeout)
    206 if not predicate_fn(exc):
    207     final_exc, source_exc = exc_factory_fn(
    208         error_list,
    209         RetryFailureReason.NON_RETRYABLE_ERROR,
    210         original_timeout,
    211     )
--> 212     raise final_exc from source_exc
    213 if on_error_fn is not None:
    214     on_error_fn(exc)

File ~\Documents\24f-coop\demovenv\Lib\site-packages\google\api_core\retry\retry_unary.py:144, in retry_target(target, predicate, sleep_generator, timeout, on_error, exception_factory, **kwargs)
    142 for sleep in sleep_generator:
    143     try:
--> 144         result = target()
    145         if inspect.isawaitable(result):
    146             warnings.warn(_ASYNC_RETRY_WARNING)

File ~\Documents\24f-coop\demovenv\Lib\site-packages\google\cloud\bigquery\job\query.py:1630, in QueryJob.result.<locals>.is_job_done()
   1607 if job_failed_exception is not None:
   1608     # Only try to restart the query job if the job failed for
   1609     # a retriable reason. For example, don't restart the query
   (...)
   1627     # into an exception that can be processed by the
   1628     # `job_retry` predicate.
   1629     restart_query_job = True
-> 1630     raise job_failed_exception
   1631 else:
   1632     # Make sure that the _query_results are cached so we
   1633     # can return a complete RowIterator.
   (...)
   1639     # making any extra API calls if the previous loop
   1640     # iteration fetched the finished job.
   1641     self._reload_query_results(
   1642         retry=retry, **reload_query_results_kwargs
   1643     )

BadRequest: 400 Unrecognized name: DISINCT at [11:16]; reason: invalidQuery, location: query, message: Unrecognized name: DISINCT at [11:16]

Location: US
Job ID: 53cab13b-8c0f-4960-affe-9c835ea20513